Search CORE

8 research outputs found

Stopword detection for streaming content

Author: Al-Obeidat Feras
Bagheri Ebrahim
Bashari Masoud
Fani Hossein
Zarrinkalam Fattane
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018
Field of study

© Springer International Publishing AG, part of Springer Nature 2018. The removal of stopwords is an important preprocessing step in many natural language processing tasks, which can lead to enhanced performance and execution time. Many existing methods either rely on a predefined list of stopwords or compute word significance based on metrics such as tf-idf. The objective of our work in this paper is to identify stopwords, in an unsupervised way, for streaming textual corpora such as Twitter, which have a temporal nature. We propose to consider and model the dynamics of a word within the streaming corpus to identify the ones that are less likely to be informative or discriminative. Our work is based on the discrete wavelet transform (DWT) of word signals in order to extract two features, namely scale and energy. We show that our proposed approach is effective in identifying stopwords and improves the quality of topics in the task of topic detection

ZU Scholars (Zayed University)

Mining User Interests from Social Media

Author: Abel Fabian
Kapanipathi Pavan
Kapanipathi Pavan
Narducci Fedelucio
Piao Guangyuan
Zarrinkalam Fattane
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2020
Field of study

Social media users readily share their preferences, life events, sentiment and opinions, and implicitly signal their thoughts, feelings, and psychological behavior. This makes social media a viable source of information to accurately and effectively mine users' interests with the hopes of enabling more effective user engagement, better quality delivery of appropriate services and higher user satisfaction. In this tutorial, we cover five important aspects related to the effective mining of user interests: (1) the foundations of social user interest modeling, such as information sources, various types of representation models and temporal features, (2) techniques that have been adopted or proposed for mining user interests, (3) different evaluation methodologies and benchmark datasets, (4) different applications that have been taking advantage of user interest mining from social media platforms, and (5) existing challenges, open research questions and exciting opportunities for further work

MURAL - Maynooth University Research Archive Library

Crossref

NUI Maynooth Eprint Archive

Maynooth University ePrints and eTheses Archive

Archivio della ricerca- Università di Roma La Sapienza

Learning heterogeneous subgraph representations for team discovery

Author: Al-Obeidat Feras
Bagheri Ebrahim
Hamidi Rad Radin
Kargar Mehdi
Nguyen Hoang
Srivastava Divesh
Szlichta Jaroslaw
Zarrinkalam Fattane
Publication venue: ZU Scholars
Publication date: 01/12/2023
Field of study

The team discovery task is concerned with finding a group of experts from a collaboration network who would collectively cover a desirable set of skills. Most prior work for team discovery either adopt graph-based or neural mapping approaches. Graph-based approaches are computationally intractable often leading to sub-optimal team selection. Neural mapping approaches have better performance, however, are still limited as they learn individual representations for skills and experts and are often prone to overfitting given the sparsity of collaboration networks. Thus, we define the team discovery task as one of learning subgraph representations from a heterogeneous collaboration network where the subgraphs represent teams which are then used to identify relevant teams for a given set of skills. As such, our approach captures local (node interactions with each team) and global (subgraph interactions between teams) characteristics of the representation network and allows us to easily map between any homogeneous and heterogeneous subgraphs in the network to effectively discover teams. Our experiments over two real-world datasets from different domains, namely DBLP bibliographic dataset with 10,647 papers and IMDB with 4882 movies, illustrate that our approach outperforms the state-of-the-art baselines on a range of ranking and quality metrics. More specifically, in terms of ranking metrics, we are superior to the best baseline by approximately 15 % on the DBLP dataset and by approximately 20 % on the IMDB dataset. Further, our findings illustrate that our approach consistently shows a robust performance improvement over the baselines

ZU Scholars (Zayed University)

SemCiR

Author: Fattane Zarrinkalam
Mohsen Kahani
Publication venue: 'Emerald'
Publication date
Field of study

Crossref

Neural embedding-based specificity metrics for pre-retrieval query performance prediction

Author: Al-Obeidat Feras
Arabzadeh Negar
Bagheri Ebrahim
Jovanovic Jelena
Zarrinkalam Fattane
Publication venue: 'Elsevier BV'
Publication date: 01/07/2020
Field of study

© 2020 Elsevier Ltd In information retrieval, the task of query performance prediction (QPP) is concerned with determining in advance the performance of a given query within the context of a retrieval model. QPP has an important role in ensuring proper handling of queries with varying levels of difficulty. Based on the extant literature, query specificity is an important indicator of query performance and is typically estimated using corpus-specific frequency-based specificity metrics However, such metrics do not consider term semantics and inter-term associations. Our work presented in this paper distinguishes itself by proposing a host of corpus-independent specificity metrics that are based on pre-trained neural embeddings and leverage geometric relations between terms in the embedding space in order to capture the semantics of terms and their interdependencies. Specifically, we propose three classes of specificity metrics based on pre-trained neural embeddings: neighborhood-based, graph-based, and cluster-based metrics. Through two extensive and complementary sets of experiments, we show that the proposed specificity metrics (1) are suitable specificity indicators, based on the gold standards derived from knowledge hierarchies (Wikipedia category hierarchy and DMOZ taxonomy), and (2) have better or competitive performance compared to the state of the art QPP metrics, based on both TREC ad hoc collections namely Robust’04, Gov2 and ClueWeb’09 and ANTIQUE question answering collection. The proposed graph-based specificity metrics, especially those that capture a larger number of inter-term associations, proved to be the most effective in both query specificity estimation and QPP. We have also publicly released two test collections (i.e. specificity gold standards) that we built from the Wikipedia and DMOZ knowledge hierarchies

ZU Scholars (Zayed University)

User behavior mining on social media: a systematic literature review

Author: A Almars
A Guille
A Sheikhahmadi
A Vespignani
A-L Barabasi
AJ Slaughter
Amir Masoud Rahmani
Andreas M. Kaplan
B Baingana
B Golub
B Gonçalves
B Karrer
Bo Jiang
Brandon Oselio
C Jiang
C Liu
Chao Tong
D Lazer
DA Locher
Dawei Jin
DJ Watts
E Colleoni
E Lahuerta-Otero
Everett M. Rogers
F Riquelme
F Wu
F Zarrinkalam
Fattane Zarrinkalam
FS Pereira
G Kossinets
G Miritello
G Szabo
H Yin
H-N Kim
Hao Wang
HU Khan
I Lobel
J Han
J Kang
J Li
J Tang
J Wan
JJ Jones
K Saito
K Sneppen
K Zhuang
Kevin S. Xu
L Backstrom
L Cao
L Liu
L Rashotte
L Weng
Lei Meng
Liang Liu
LK Gallos
LM Aiello
M Cha
M Conover
M Deutsch
M Gomez-Rodriguez
M Granovetter
M Hosseini-Pozveh
M Karsai
M McPherson
Maryam Hosseini-Pozveh
ME Newman
MK Sohrabi
MO Jackson
MS Granovetter
Naveen Kumar Sharma
P Brereton
P Wang
PW Holland
Q Sun
Qing Bao
Rahebeh Mojtahedi Safari
RL Fogués
S Aral
S Wuchty
Sasan H. Alizadeh
Shaozhi Ye
Siyu Tang
SP Borgatti
T Milenković
T Xu
V Garousi
Vincent D Blondel
Y Halberstam
Yinxue Yi
Yunpeng Xiao
Z Zhu
Zhiguo Zhu
ZZ Alp
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

Crossref